small weight
Pay Attention to Small Weights
Zhou, Chao, Jacobs, Tom, Gadhikar, Advait, Burkholz, Rebekka
Finetuning large pretrained neural networks is known to be resource-intensive, in terms of both memory and computational cost. To mitigate this, a common approach is to restrict training to a subset of the model parameters. By analyzing the relationship between gradients and weights during finetuning, we observe a notable pattern: large gradients are often associated with small-magnitude weights. This correlation is more pronounced in finetuning settings than in training from scratch. Motivated by this observation, we propose NANOADAM, which dynamically updates only the small-magnitude weights during finetuning and offers several practical advantages: first, the criterion is gradient-free -- the parameter subset can be determined without gradient computation; second, it preserves large-magnitude weights, which are likely to encode critical features learned during pretraining, thereby reducing the risk of catastrophic forgetting; third, it permits the use of larger learning rates and consistently leads to better generalization performance in experiments. We demonstrate this for both NLP and vision tasks.
- North America > United States (0.14)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
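The selection criterion the abstract describes can be sketched in a few lines. The function and parameter names below are hypothetical illustrations of the idea (a gradient-free mask over the smallest-magnitude weights, with an Adam-style step applied only inside the mask), not the authors' implementation:

```python
import numpy as np

def small_weight_mask(w, fraction=0.1):
    """Select the `fraction` of weights with the smallest magnitude.

    Gradient-free: the subset depends only on |w|, not on any gradient."""
    k = max(1, int(fraction * w.size))
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.abs(w) <= threshold

def masked_adam_step(w, g, m, v, mask, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """One Adam step applied only where `mask` is True; other weights stay frozen."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    step = lr * m_hat / (np.sqrt(v_hat) + eps)
    w = w - np.where(mask, step, 0.0)  # large-magnitude weights are untouched
    return w, m, v

rng = np.random.default_rng(0)
w = rng.normal(size=100)
g = rng.normal(size=100)
mask = small_weight_mask(w, fraction=0.1)          # 10 smallest-|w| entries
m = np.zeros_like(w)
v = np.zeros_like(w)
w_new, m, v = masked_adam_step(w, g, m, v, mask, t=1)
```

Freezing everything outside the mask is what protects the large pretrained weights; only the small-magnitude subset moves.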
Reviews: Extracting Relationships by Multi-Domain Matching
Title: Extracting Relationships by Multi-Domain Matching

Summary: Assuming that a corpus is compiled from many sources belonging to different domains, of which only a strict subset is suitable for learning to predict in a target domain, this paper proposes a novel approach, the Multiple Domain Matching Network (MDMN), that learns which domains share strong statistical relationships and which source domains best support the target-domain prediction task. While many approaches to multiple-domain adaptation aim to match the feature-space distribution of *every* source domain to that of the target, this paper proposes matching distributions not only between sources and target but also *within* the source domains. The latter allows identifying subsets of source domains that share a strong statistical relationship.

Strengths: The paper provides a theoretical analysis that yields a tighter bound on the weighted multi-source discrepancy.

Weaknesses: The tighter bound on the multi-source discrepancy depends on the assumption that source domains less relevant to the target domain receive lower weights.
Neural Computing with Small Weights
An important issue in neural computation is the dynamic range of weights in the neural networks. Many experimental results on learning indicate that the weights in the networks can grow prohibitively large with the size of the inputs. Here we address this issue by studying the tradeoffs between the depth and the size of weights in polynomial-size networks of linear threshold elements (LTEs). We show that there is an efficient way of simulating a network of LTEs with large weights by a network of LTEs with small weights. In particular, we prove that every depth-d, polynomial-size network of LTEs with exponentially large integer weights can be simulated by a depth-(2d+1), polynomial-size network of LTEs with polynomially bounded integer weights.
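A concrete instance of the weight-growth problem the abstract addresses: a single linear threshold element computing n-bit COMPARISON (x >= y) in the standard construction uses weights ±2^i, i.e. exponentially large in n; the simulation result above trades extra depth for polynomially bounded weights. A minimal illustration (the helper names are ours):

```python
def lte(weights, x, threshold=0):
    """Linear threshold element: 1 if the weighted sum reaches the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

def comparison_weights(n):
    """Weights realizing COMPARISON(x, y) = [x >= y] on n-bit inputs: +/- 2^i."""
    return [2**i for i in range(n)] + [-(2**i) for i in range(n)]

def comparison(xbits, ybits):
    """Bit i has place value 2^i; x contributes +2^i, y contributes -2^i."""
    return lte(comparison_weights(len(xbits)), list(xbits) + list(ybits))

# The largest weight grows as 2^(n-1): exponential in the input size.
max_weight = {n: max(comparison_weights(n)) for n in (4, 8, 16)}
```

The theorem says this exponential blow-up is avoidable if one accepts a constant-factor increase in depth.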
A Tighter Bound for Graphical Models
Leisink, Martijn A. R., Kappen, Hilbert J.
The neurons in these networks are the random variables, whereas the connections between them model the causal dependencies. Usually, some of the nodes have a direct relation with the random variables in the problem and are called 'visibles'. The other nodes, known as 'hiddens', are used to model more complex probability distributions. Learning in graphical models can be done as long as the likelihood that the visibles correspond to a pattern in the data set can be computed. In general, the time this takes scales exponentially with the number of hidden neurons.
- Europe > Netherlands > Gelderland > Nijmegen (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > Middle East > Jordan (0.04)
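The exponential cost the abstract refers to can be seen directly in a brute-force computation of the visible likelihood for a tiny Boltzmann-machine-style model. This is a generic sketch; the parameterization is ours, and the loops over 2^h hidden (and 2^v visible) configurations are exactly what becomes intractable as the number of hiddens grows:

```python
import itertools
import numpy as np

def likelihood_visible(v, W, b_h, b_v):
    """P(v) computed by exhaustively summing over all 2^h hidden configurations."""
    h_dim = len(b_h)
    v_dim = len(b_v)

    def unnorm(vv, hh):
        # Unnormalized probability of a joint (visible, hidden) configuration.
        return np.exp(vv @ W @ hh + b_h @ hh + b_v @ vv)

    # Numerator: v clamped, sum over hidden states (2^h terms).
    num = sum(unnorm(v, np.array(h))
              for h in itertools.product([0, 1], repeat=h_dim))
    # Partition function: sum over all joint states (2^v * 2^h terms).
    Z = sum(unnorm(np.array(vv), np.array(h))
            for vv in itertools.product([0, 1], repeat=v_dim)
            for h in itertools.product([0, 1], repeat=h_dim))
    return num / Z

rng = np.random.default_rng(1)
v_dim, h_dim = 2, 3
W = rng.normal(scale=0.5, size=(v_dim, h_dim))
b_h = rng.normal(scale=0.5, size=h_dim)
b_v = rng.normal(scale=0.5, size=v_dim)
p = likelihood_visible(np.array([1, 0]), W, b_h, b_v)
```

Bounds of the kind the paper studies exist precisely to avoid evaluating this sum exactly.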
On Neural Networks with Minimal Weights
Bohossian, Vasken, Bruck, Jehoshua
Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes the sign of a weighted sum of its input variables. The weights are arbitrary integers; in fact, they can be very large, exponential in the number of input variables. In practice, however, such large weights are difficult to implement. The literature distinguishes between two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- North America > United States > California > Los Angeles County > Pasadena (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
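A linear threshold element as described above is a one-liner, and MAJORITY is the classic example of a function at the small-weight extreme: unit weights suffice. The names below are illustrative, not from the paper:

```python
def linear_threshold(weights, threshold, x):
    """Output 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

def majority(bits):
    """MAJORITY needs only weight 1 per input: a polynomial-size-weight function."""
    n = len(bits)
    return linear_threshold([1] * n, n // 2 + 1, bits)
```

Functions like n-bit comparison sit at the other extreme, where the natural single-element realization uses weights exponential in n.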